Good morning/afternoon. Today, we're exploring the relationship between two tests: the two-sample z-test for proportions and the chi-square test of independence.
I can mic drop right here. You can leave the room now.
# LIBRARIES
library(readr)
library(dplyr)
Flashback to the 60's.
chd69=1 implies that a CHD event occurred vs. chd69=0 codes no CHD event.
dibpat0=1 codes participants with a "Type A" personality and dibpat0=0 codes participants with a "Type B" personality.
# READ IN DATA
dat <- read_csv("data/lab_11.csv")
# PREVIEW DATA
head(dat)
Recognize that we're testing for independence.
Because we're working with a 2x2 contingency table (as you'll see soon), testing whether CHD is independent of personality type is the same as testing whether the CHD proportion is equal in the two groups.
$H_0$: $P(\text{CHD}=1 \mid \text{Type A}) = P(\text{CHD}=1 \mid \text{Type B})$
$H_1$: $P(\text{CHD}=1 \mid \text{Type A}) \neq P(\text{CHD}=1 \mid \text{Type B})$
# THE OVERALL PROPORTION
# OFTEN CALLED "POOLED"
overall_p <- dat %>%
  summarize(overall_p = mean(chd69),
            se = sqrt(overall_p * (1 - overall_p) * (1/100 + 1/100)))
overall_p
# CALCULATE THE SAMPLE STATS FOR EACH GROUP
summary_stats <- dat %>%
  group_by(dibpat0) %>%
  summarize(n = n(), propCHD = mean(chd69))
# BASE OUR TEST OFF OF THIS
summary_stats
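The standard error above hard-codes both group sizes as 100. Here is a minimal sketch of pulling the group sizes from the data instead, in base R. The data frame `toy` is a hypothetical stand-in for `dat`, rebuilt from the 3/100 and 16/100 counts used later in this lab; which count goes with which `dibpat0` value is an assumption made only for this sketch.

```r
# Hypothetical stand-in for `dat`, rebuilt from the counts 3/100 and 16/100.
# ASSUMPTION: the assignment of counts to dibpat0 values is illustrative only.
toy <- data.frame(
  dibpat0 = rep(c(0, 1), each = 100),
  chd69   = c(rep(c(1, 0), times = c(3, 97)),
              rep(c(1, 0), times = c(16, 84)))
)

n_by_group <- table(toy$dibpat0)   # group sizes, read from the data
pooled_p   <- mean(toy$chd69)      # overall ("pooled") proportion of CHD events
pooled_se  <- sqrt(pooled_p * (1 - pooled_p) * sum(1 / n_by_group))

pooled_p
pooled_se
```

With equal groups of 100 this reproduces the hard-coded version, but it would keep working if the group sizes differed.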
Using the above values, we calculate our z-statistic with the "Proportion of Two Populations" formula from the bCourses Statistical Inference Reference Sheet.
# Z-TEST STATISTIC
z_stat <- (0.16 - 0.03) / 0.04146685
z_stat
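The numbers typed in above can also be rebuilt from the raw counts. This sketch assumes the pooled form of the statistic, $z = (\hat{p}_1 - \hat{p}_2) / \sqrt{\hat{p}(1-\hat{p})(1/n_1 + 1/n_2)}$, which is what the standard error in the earlier cell computes. The names `x`, `n`, and `z_from_counts` are made up for this example.

```r
# Counts from this lab: CHD events and group sizes
x <- c(16, 3)       # CHD events in each group
n <- c(100, 100)    # participants in each group

p_hat  <- x / n                 # sample proportions: 0.16 and 0.03
p_pool <- sum(x) / sum(n)       # pooled proportion under H0: 19 / 200
se     <- sqrt(p_pool * (1 - p_pool) * sum(1 / n))

z_from_counts <- (p_hat[1] - p_hat[2]) / se
z_from_counts
```

Working from counts avoids the rounding that creeps in when proportions and standard errors are copied by hand.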
# TWO-SIDED
p_value <- pnorm(q = abs(z_stat), lower.tail = FALSE) * 2
p_value
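As a preview of where this lab is heading, here is a small sketch showing that the two-sided normal p-value equals the upper-tail p-value of a chi-square distribution with 1 degree of freedom evaluated at $z^2$. The variable names are hypothetical; the z value is the one computed in this lab.

```r
z <- 3.13503437082875                          # z-statistic from this lab

p_two_sided <- 2 * pnorm(-abs(z))              # both tails of the standard normal
p_chisq     <- pchisq(z^2, df = 1,
                      lower.tail = FALSE)      # upper tail of chi-square(1)

p_two_sided
all.equal(p_two_sided, p_chisq)
```

Using `-abs(z)` keeps the two-sided p-value correct whichever group is subtracted from the other.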
Now that we've seen the machinery... take a shortcut.
# WOW, I PREFER THIS
prop.test(x = c(3, 16), n = c(100, 100), correct = FALSE)
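One thing to notice in the shortcut: `prop.test()` reports a statistic labelled `X-squared`, not a z. A quick sketch of pulling it out of the result and taking the square root to recover the magnitude of our z-statistic (the names `res` and `x_squared` are just for this example):

```r
res <- prop.test(x = c(3, 16), n = c(100, 100), correct = FALSE)

x_squared <- unname(res$statistic)   # the reported X-squared value
sqrt(x_squared)                      # magnitude of the z-statistic
res$p.value                          # two-sided p-value
```

The square root drops the sign, so it tells you the size of z but not which group had the larger proportion.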
two_way <- matrix(c(3, 97, 16, 84), byrow=TRUE, nrow=2)
two_way
row.names(two_way) <- c("type a", "type b")
colnames(two_way) <- c("chd=1", "chd=0")
two_way
We need to calculate marginals.
totals_1 <- c(3+97, 16+84)
totals_2 <- c(3+16, 97+84, 3+97+16+84)
two_way <- rbind(cbind(two_way, totals_1), totals_2)
two_way
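The totals above are typed in by hand. A minimal sketch of the same table built with `rowSums()` and `colSums()`, so the marginals follow the counts automatically (the names `counts` and `with_totals` are hypothetical; the row and column labels are the ones used in this lab):

```r
counts <- matrix(c(3, 97, 16, 84), byrow = TRUE, nrow = 2,
                 dimnames = list(c("type a", "type b"),
                                 c("chd=1", "chd=0")))

# Append a column of row totals, then a row of column totals plus the grand total
with_totals <- rbind(cbind(counts, total = rowSums(counts)),
                     total = c(colSums(counts), sum(counts)))
with_totals
```

Keeping `counts` as its own 2x2 object also means the raw cells stay available for the test itself.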
Get the "$O_i - E_i$"s (observed minus expected counts).
ei_rows <- c(19*100/200, 181*100/200)
Note that the rows of expected counts won't always be identical. They are here only because both groups have the same number of participants.
expected_counts <- rbind(ei_rows, ei_rows)
expected_counts
two_way[1:2,1:2] - expected_counts
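For tables where the groups are not the same size, the expected count in each cell is (row total × column total) / grand total. A sketch of that general rule using `outer()`, with hypothetical names `observed` and `expected`:

```r
observed <- matrix(c(3, 97, 16, 84), byrow = TRUE, nrow = 2)

# Expected count for each cell: row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

expected              # each row is 9.5 and 90.5 for this table
observed - expected   # the O - E differences
```

For our table this gives the same numbers as the hand calculation, since both row totals are 100.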
We will construct the statistic as we see in the reference page.
# CHI-SQ TEST STATISTIC
sum((two_way[1:2,1:2] - expected_counts)^2 / expected_counts)
We can use the R function now.
# PASS ONLY THE 2x2 COUNTS, NOT THE MARGINAL TOTALS
chisq.test(two_way[1:2, 1:2], correct = FALSE)
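`chisq.test()` expects the table of observed counts only, without any row or column totals. A short sketch on a fresh 2x2 matrix, pulling out the pieces we computed by hand (the names `observed` and `fit` are hypothetical):

```r
observed <- matrix(c(3, 97, 16, 84), byrow = TRUE, nrow = 2)
fit <- chisq.test(observed, correct = FALSE)

fit$expected            # expected counts under independence
unname(fit$statistic)   # Pearson chi-square statistic
unname(fit$parameter)   # degrees of freedom
fit$p.value
```

For a 2x2 table the degrees of freedom should come out as 1; a larger value is a hint that totals were included in the input.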
The relationship between the statistics:
z_stat <- 3.13503437082875
x_stat <- 9.8284
# SQUARING THE Z-STATISTIC GIVES THE CHI-SQ STATISTIC (UP TO ROUNDING)
z_stat^2
x_stat
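This is not a coincidence of our particular numbers. A sketch of why $z^2 = \chi^2$ for any 2x2 table, assuming group sizes $n_1, n_2$, sample proportions $\hat{p}_1, \hat{p}_2$, and pooled proportion $\hat{p}$ (notation chosen here, not taken from the reference sheet):

```latex
\begin{align*}
\hat{p} &= \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2},
\qquad
z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \\[4pt]
% Every cell has the same |O - E|:
|O - E| &= n_1\,|\hat{p}_1 - \hat{p}|
         = \frac{n_1 n_2}{n_1 + n_2}\,|\hat{p}_1 - \hat{p}_2| \\[4pt]
% The expected counts are n_1 p, n_1 (1 - p), n_2 p, n_2 (1 - p):
\chi^2 &= (O - E)^2 \left[\frac{1}{n_1\hat{p}} + \frac{1}{n_1(1-\hat{p})}
          + \frac{1}{n_2\hat{p}} + \frac{1}{n_2(1-\hat{p})}\right] \\
       &= \left(\frac{n_1 n_2}{n_1 + n_2}\right)^{2} (\hat{p}_1 - \hat{p}_2)^2
          \cdot \frac{\frac{1}{n_1} + \frac{1}{n_2}}{\hat{p}(1-\hat{p})} \\
       &= \frac{(\hat{p}_1 - \hat{p}_2)^2}{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}
        = z^2
\end{align*}
```

Plugging in our table, $|O - E| = 100 \cdot 100 \cdot 0.13 / 200 = 6.5$ in every cell, which matches the differences computed earlier.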
All done!